Mr Surender Singh

Assistant Professor

Om Institute of Technology & Management, Hisar

ABSTRACT 

Data mining applications include a variety of 
methodologies that have been developed by 
commercial and research centers. These techniques 
have been used for industrial, commercial and 
scientific purposes. Data mining is most useful in an 
exploratory analysis scenario in which there are no 
predetermined notions about what will constitute an 
"interesting" outcome. WEKA contains a set of 
visualization tools and algorithms for data analysis 
and predictive modeling, together with graphical user 
interfaces for easy access to this functionality. 

INTRODUCTION 

Data mining software includes various methodologies 
that have been developed by both commercial and 
research centers. These techniques have been used for 
industrial, commercial and scientific purposes. For 
example, data mining has been used to analyze large 
datasets and to establish useful classifications and 
patterns in them. Agricultural, medical, education and 
biological research studies have used various data 
analysis techniques, including decision trees, 
statistical machine learning, classification, clustering 
and other analysis methods. The main objectives of our 
work are to
analyze the performance of different classification 
and clustering methods on an education dataset 
using the WEKA software. In this thesis, we present a 
comparison of different classification and clustering 
techniques using the Waikato Environment for 
Knowledge Analysis (WEKA), developed at the 
University of Waikato. The study focuses on applying 
data mining techniques to a previously analyzed data 
set, using the data mining tool WEKA. WEKA is free 
software available under the GNU General Public 
License. It is open-source software consisting of a 
collection of machine learning algorithms for data 
mining tasks. [3] 

Data mining is the process of discovering hidden 
information in a huge amount of data. Data mining 
analyzes data from different sources and converts it 
into meaningful information. It is a new and powerful 
technology that helps businesses focus on the most 
important information in their data, such as future 
trends and customer choices, to support decision 
making. A target dataset is prepared before the data 
mining algorithm is applied; the common source of 
data is the data warehouse. The data sets must be 
pre-processed before data mining is applied. [2] 
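The preparation step can be illustrated with a minimal Python sketch. It assumes a small, hypothetical target dataset of student records stored as dictionaries, fills missing numeric values with the attribute mean, and rescales each numeric attribute to the range [0, 1] with min-max normalization. The attribute names and the choice of these two pre-processing operations are illustrative assumptions, not the procedure used in this work.

```python
from statistics import mean

def fill_missing(records, attrs):
    """Replace None in each numeric attribute with that attribute's mean (in place)."""
    for a in attrs:
        known = [r[a] for r in records if r[a] is not None]
        m = mean(known)
        for r in records:
            if r[a] is None:
                r[a] = m
    return records

def min_max_normalize(records, attrs):
    """Rescale each numeric attribute to the range [0, 1] (in place)."""
    for a in attrs:
        lo = min(r[a] for r in records)
        hi = max(r[a] for r in records)
        span = hi - lo
        for r in records:
            # A constant attribute carries no information; map it to 0.
            r[a] = 0.0 if span == 0 else (r[a] - lo) / span
    return records

# Hypothetical target dataset: three student records, one with a missing mark.
students = [
    {"attendance": 90, "marks": 72},
    {"attendance": 60, "marks": None},
    {"attendance": 75, "marks": 48},
]
numeric = ["attendance", "marks"]
clean = min_max_normalize(fill_missing(students, numeric), numeric)
for row in clean:
    print(row)
```

Mean imputation and min-max scaling are only two common choices; which pre-processing a real dataset needs depends on its attributes.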
Figure 1: Knowledge Discovery Process (KDP) 


Knowledge Discovery in Databases (KDD) is the 
non-trivial process of identifying valid, novel, 
potentially useful, and ultimately understandable 
patterns in data. According to this definition, data is 
a set of facts that is somehow accessible in electronic 
form. The term “patterns” indicates models and 
regularities that can be observed within the data. 
Patterns have to be valid, i.e. they should hold on new 
data with some degree of certainty. A novel pattern is 
one that was not previously known or trivially true. 
The potential usefulness of patterns refers to the 
possibility that they lead to an action providing a 
benefit. A pattern is understandable if it is 
interpretable by a human user. Finally, KDD is a 
process, meaning that it consists of several steps that 
are repeated over several iterations. [5] 
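The validity criterion (a pattern should still hold on new data with some degree of certainty) can be made concrete with a small Python sketch. Here a "pattern" is assumed to be a single rule of the form "IF attribute = value THEN class = label", mined from one part of a hypothetical dataset and then checked on records that were held out. The rule form, the field names and the data are illustrative assumptions.

```python
from collections import Counter

def best_rule(train, attrs, target):
    """Find the rule 'IF attr == value THEN target == label' with the highest
    confidence on the training records (ties broken by how many records it covers)."""
    best = None
    for a in attrs:
        for v in sorted({r[a] for r in train}):  # sorted for a deterministic search
            covered = [r for r in train if r[a] == v]
            label, hits = Counter(r[target] for r in covered).most_common(1)[0]
            key = (hits / len(covered), len(covered))
            if best is None or key > best[0]:
                best = (key, (a, v, label))
    return best[1]

def confidence(rule, records, target):
    """Fraction of records matching the rule's condition whose target agrees."""
    a, v, label = rule
    covered = [r for r in records if r[a] == v]
    if not covered:
        return None  # the rule says nothing about these records
    return sum(r[target] == label for r in covered) / len(covered)

# Hypothetical records used to mine the pattern.
train = [
    {"attendance": "high", "homework": "yes", "result": "pass"},
    {"attendance": "high", "homework": "no", "result": "pass"},
    {"attendance": "high", "homework": "yes", "result": "pass"},
    {"attendance": "low", "homework": "yes", "result": "fail"},
    {"attendance": "low", "homework": "no", "result": "fail"},
    {"attendance": "low", "homework": "yes", "result": "pass"},
]
# Hypothetical new records the pattern has not seen.
held_out = [
    {"attendance": "high", "homework": "no", "result": "pass"},
    {"attendance": "high", "homework": "yes", "result": "pass"},
    {"attendance": "high", "homework": "no", "result": "fail"},
    {"attendance": "low", "homework": "yes", "result": "fail"},
]
rule = best_rule(train, ["attendance", "homework"], "result")
print(rule, confidence(rule, train, "result"), confidence(rule, held_out, "result"))
```

A drop in confidence from the training records to the held-out records is the kind of evidence that prompts another iteration of the KDD process.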

WEKA: DATA MINING SOFTWARE 

Weka (Waikato Environment for Knowledge 
Analysis) is a popular suite of machine 
learning software written in Java, developed at the 
University of Waikato, New Zealand. Weka is free 
software available under the GNU General Public 
License. The Weka workbench contains a collection 
of visualization tools and algorithms for data 
analysis and predictive modeling, together with 
graphical user interfaces for easy access to this 
functionality. The original, non-Java version of Weka 
was primarily designed as a tool for analyzing data 
from agricultural domains, but the more recent fully 
Java-based version (Weka 3) is now used in many 
different application areas, in particular for 
educational purposes and research. [6] 
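Data is usually given to Weka in its ARFF (Attribute-Relation File Format) text format, which declares a relation name, one @attribute line per column (nominal attributes list their allowed values in braces) and then the @data rows, with "?" marking a missing value. The Python sketch below builds such a file from in-memory records. The relation name, attribute names and values are hypothetical, and the sketch does not quote values that contain spaces or commas.

```python
def to_arff(relation, attributes, rows):
    """Return ARFF text. `attributes` is a list of (name, kind) pairs where kind
    is either the string "numeric" or a list of nominal values."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: allowed values go inside braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        # ARFF marks missing values with '?'.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"

arff = to_arff(
    "students",
    [("attendance", "numeric"), ("homework", ["yes", "no"]), ("result", ["pass", "fail"])],
    [(90, "yes", "pass"), (None, "no", "fail")],
)
print(arff)
```

Once the text is saved as, say, `students.arff`, it can be opened in the Explorer interface, or a classifier such as J48 can be trained on it from the command line with `java -cp weka.jar weka.classifiers.trees.J48 -t students.arff`, assuming `weka.jar` is available locally.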
Figure 2: Weka User Interface 

CLASSIFICATION & CLUSTERING TECHNIQUES 

Classification 

Classification is the most commonly applied data 
mining technique. It employs a set of pre-classified 
examples to develop a model that can classify the 
population of records at large. Fraud detection and 
credit risk applications are particularly well suited to 
this type of analysis. This approach frequently 
employs decision-tree or neural-network-based 
classification algorithms. The data classification 
process involves two steps, learning and classification. 
In the learning step, the training data are analyzed by 
a classification algorithm to build the model. In the 
classification step, test data are used to estimate the 
accuracy of the model, and if the accuracy is 
acceptable, the model is applied to new records. [4] 
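The two steps can be sketched in Python with the simplest decision-tree learner, a one-level tree (decision stump) on a single numeric attribute. The learning step searches the pre-classified training records for the threshold that misclassifies the fewest of them, and the classification step applies that threshold to new records. The attribute (a mark out of 100), the labels and the data are hypothetical, and the stump stands in here for fuller decision-tree algorithms.

```python
def learn_stump(examples):
    """Learning step. `examples` are (value, label) pairs with labels 'pass'/'fail'
    and at least two distinct values. Try the midpoints between sorted distinct
    values and keep the threshold t for which 'value >= t -> pass, else fail'
    makes the fewest errors on the training data."""
    values = sorted({v for v, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((v >= t) != (label == "pass") for v, label in examples)

    return min(candidates, key=errors)

def classify(threshold, value):
    """Classification step: apply the learned rule to a new record."""
    return "pass" if value >= threshold else "fail"

# Hypothetical pre-classified training records: (mark, result).
training = [(35, "fail"), (42, "fail"), (55, "pass"), (48, "fail"), (70, "pass"), (63, "pass")]
t = learn_stump(training)
print(t, [classify(t, v) for v in (40, 58)])
```

In practice the learned threshold would first be checked against separate test records before being used on new data, as the classification step describes.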


Types of classification models: 

1. Classification by decision tree induction 

2. Bayesian Classification 

3. Neural Networks 

4. Support Vector Machines (SVM) 

5. Classification Based on Associations 
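Of the model types listed, Bayesian classification is the easiest to show compactly. The Python sketch below is a naive Bayes classifier for nominal attributes: it estimates P(class) and P(attribute = value | class) from counts with Laplace (add-one) smoothing and predicts the class with the highest resulting score. The attribute names and records are hypothetical, and the smoothing choice is an assumption of the sketch.

```python
from collections import Counter, defaultdict

def train_nb(records, attrs, target):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(r[target] for r in records)
    value_counts = defaultdict(Counter)  # value_counts[(attr, cls)][value] -> count
    domains = {a: {r[a] for r in records} for a in attrs}
    for r in records:
        for a in attrs:
            value_counts[(a, r[target])][r[a]] += 1
    return class_counts, value_counts, domains

def predict_nb(model, record, attrs):
    """Return the class maximizing P(cls) * product of P(attr = value | cls)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_cls, best_score = None, -1.0
    for cls, n in sorted(class_counts.items()):
        score = n / total  # prior P(cls)
        for a in attrs:
            # Laplace smoothing: add 1 to every count so unseen values keep a non-zero probability.
            score *= (value_counts[(a, cls)][record[a]] + 1) / (n + len(domains[a]))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

attrs = ["attendance", "homework"]
records = [
    {"attendance": "high", "homework": "yes", "result": "pass"},
    {"attendance": "high", "homework": "no", "result": "pass"},
    {"attendance": "low", "homework": "yes", "result": "pass"},
    {"attendance": "low", "homework": "no", "result": "fail"},
    {"attendance": "low", "homework": "no", "result": "fail"},
]
model = train_nb(records, attrs, "result")
print(predict_nb(model, {"attendance": "low", "homework": "no"}, attrs))
```

The "naive" part is the assumption that attributes are independent given the class, which is what allows the per-attribute probabilities to be multiplied.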
Figure 3: Example of Classification Technique 


Clustering 

Clustering can be described as the identification of 
similar classes of objects. By using clustering 
techniques, we can further identify dense and sparse 
regions in the object space and discover the overall 
distribution pattern and the correlations among data 
attributes. [7] 
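The idea of dense and sparse regions can be made concrete by counting neighbours. The Python sketch below marks a 2-D point as lying in a dense region when at least `min_pts` points (itself included) fall within distance `eps` of it, which is the core-point test used by density-based methods such as DBSCAN. The values of `eps` and `min_pts` and the sample points are illustrative assumptions.

```python
from math import dist

def dense_points(points, eps, min_pts):
    """Return the points whose eps-neighbourhood (including the point itself)
    contains at least min_pts points; the remaining points lie in sparse regions."""
    dense = []
    for p in points:
        neighbours = sum(1 for q in points if dist(p, q) <= eps)
        if neighbours >= min_pts:
            dense.append(p)
    return dense

# Two tight groups of three points and one isolated point (hypothetical data).
points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (5, 5), (5.1, 4.8), (4.9, 5.2), (9, 0)]
print(dense_points(points, eps=0.5, min_pts=3))
```

The isolated point is left out because its neighbourhood contains only itself; density-based clustering would treat it as noise.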

Types of clustering methods: 

1. Partitioning Methods 

2. Hierarchical agglomerative (or divisive) methods 

3. Density-based methods 

4. Farthest First method 

5. Grid-based methods 

6. Model-based methods 
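The Farthest First method can be sketched in a few lines of Python: start from one point, repeatedly choose as the next cluster centre the point farthest from all centres chosen so far, then assign every point to its nearest centre. Weka provides a clusterer of this name (FarthestFirst); the sketch below only illustrates the traversal idea and is not Weka's code. It takes the first point as the initial centre so the result is deterministic, and the sample points are hypothetical.

```python
from math import dist

def farthest_first(points, k):
    """Pick k centres by farthest-first traversal, then label each point
    with the index of its nearest centre."""
    centres = [points[0]]  # deterministic start for this sketch
    while len(centres) < k:
        # Next centre: the point whose distance to its nearest existing centre is largest.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centres))
        centres.append(nxt)
    labels = [min(range(k), key=lambda i: dist(p, centres[i])) for p in points]
    return centres, labels

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8), (1, 9)]
centres, labels = farthest_first(points, 3)
print(centres)
print(labels)
```

Because each new centre is the farthest remaining point, an isolated point such as (1, 9) becomes a centre of its own; this makes the method fast but sensitive to outliers.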


Figure 4: Example of Clustering 

CONCLUSION 


Data Mining is most useful in an exploratory analysis 
scenario in which there are no predetermined notions 
about what will constitute an "interesting" outcome. 
In practice, the two primary goals of Data Mining 
tend to be prediction and description. Prediction 
involves using some variables or fields in the data set 
to predict unknown or future values of other variables 
of interest, while description focuses on finding 
human-interpretable patterns that describe the data. 
Data Mining (DM) represents a set of specific methods 
and algorithms aimed solely at extracting patterns 
from raw data. Data mining is sometimes also called 
knowledge discovery in databases (KDD). Knowledge 
Discovery in Databases (KDD) is the automatic, 
exploratory analysis and modeling of large data 
repositories. KDD is the organized process of 
identifying valid, novel, useful, and understandable 
patterns from large and complex data sets, and it can 
also reveal existing relationships in the data. Data 
mining combines machine learning, statistics and 
visualization techniques to discover and extract 
knowledge. 
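Prediction in this sense can be illustrated with the simplest predictive model, a least-squares line that estimates one numeric variable from another. The Python sketch below fits y = a + b * x to a few hypothetical (attendance, marks) pairs and predicts the mark for an unseen attendance value; the variables and numbers are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (needs at least two distinct x values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through the mean point
    return a, b

attendance = [60, 70, 80, 90]   # known values of the predictor variable
marks = [50, 58, 66, 74]        # known values of the variable of interest
a, b = fit_line(attendance, marks)
print(a, b, a + b * 85)         # predicted mark for an attendance of 85
```

Real data would not lie exactly on a line, so the fitted model should be judged on held-out records before its predictions are trusted.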